This repository gathers the resources for the paper 's1: Simple test-time scaling', which presents a minimal recipe for test-time scaling and strong reasoning performance. It covers artifacts, repository structure, inference, training, evaluation, data, visuals, and citation details.
The article explores the DeepSeek-R1 models, focusing on how reinforcement learning (RL) is used to develop advanced reasoning capabilities in AI. It discusses the DeepSeek-R1-Zero model, which learns reasoning without supervised fine-tuning, and the DeepSeek-R1 model, which combines RL with a small amount of supervised data for improved performance. The article highlights the use of distillation to transfer reasoning patterns to smaller models and addresses challenges and future directions in RL for AI.